Analytics and Data Integration (Clinical and Pre-Clinical) Bioinformatics
Key expectations: Develop predictive models and applicable initiatives; perform data integration, analytics, and machine learning on external and internal data sets; extract and load data for downstream work as end-to-end solutions. Support clinical data program analytics by processing clinical data sets for data visualization and data curation. Help the team understand their data.
Organization/Business Function Background
As part of the Data & Computational Science platform in the mRNA Center of Excellence, Research Data Science (RDS) is responsible for leading the application of statistics & advanced data analytics to drive data-driven decision-making for the advancement of mRNA vaccines and therapeutics. In this context, the RDS team integrates complex and disparate data sources spanning preclinical to translational research, uncovers patterns in the data, and develops machine learning models to predict mRNA vaccine performance and further our understanding of the mechanism of action. Familiarity with biology/immunology is preferred.
Scope Description
The project comprehensively integrates data collected from preclinical and clinical studies, encompassing diverse tests and high-dimensional data. This includes conducting advanced statistical analysis, data analytics, model development, and visualization. Additionally, it entails the creation of efficient and lightweight data science models, methods, or packages. Development responsibilities and outputs include building extensive bioinformatics workflows that ensure performance, scalability, and reproducibility when preprocessing, transforming, and analyzing large biological datasets, such as transcriptomics and flow cytometry.
Expected Benefit
The development & implementation of machine learning models to predict clinical performance based on preclinical data is essential for selecting and prioritizing the mRNA-based vaccines that advance to human trials. A core outcome of this work is a set of workflows supporting objective and robust candidate decisions based on predicted immunogenicity and reactogenicity.
Work Product
Bioinformatics Workstreams Development:
- Design, develop, and optimize end-to-end data pipelines and workflows for preprocessing, transforming, and analyzing large-scale biological datasets (transcriptomics, flow cytometry), emphasizing reproducibility, scalability, and performance.
- Implement robust data processing components that integrate data quality checks, normalization, and feature engineering, using Python, R, Bioconductor, and relevant bioinformatics toolkits.
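As a rough illustration of the kind of processing component described above, the following sketch chains a quality check, a normalization step, and a simple feature-selection step for a small gene-by-sample count table. It is a minimal example under stated assumptions, not a prescribed implementation: the function names, the `min_total` threshold, the counts-per-million normalization, and the variance-based selection are all hypothetical choices made for illustration, and production work would more likely rely on established Bioconductor or Python packages.

```python
import math
import statistics


def filter_low_count_genes(counts, min_total=10):
    """QC step: keep genes whose total count across samples is at least min_total.

    `counts` maps gene name -> list of raw counts, one entry per sample.
    The threshold is an illustrative assumption, not a recommended value.
    """
    return {gene: row for gene, row in counts.items() if sum(row) >= min_total}


def log_cpm(counts, pseudocount=1.0):
    """Normalization step: counts per million per sample, then log2(x + pseudocount)."""
    if not counts:
        return {}
    n_samples = len(next(iter(counts.values())))
    library_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    if any(size == 0 for size in library_sizes):
        raise ValueError("every sample needs a non-zero library size")
    return {
        gene: [
            math.log2(row[j] / library_sizes[j] * 1e6 + pseudocount)
            for j in range(n_samples)
        ]
        for gene, row in counts.items()
    }


def top_variable_genes(expression, n_top):
    """Feature-engineering step: names of the n_top genes with the highest variance."""
    ranked = sorted(
        expression,
        key=lambda gene: statistics.pvariance(expression[gene]),
        reverse=True,
    )
    return ranked[:n_top]


# Toy data for illustration only; not taken from any study.
raw_counts = {"geneA": [10, 20], "geneB": [0, 1], "geneC": [90, 80]}
filtered = filter_low_count_genes(raw_counts, min_total=10)
normalized = log_cpm(filtered)
features = top_variable_genes(normalized, n_top=1)
```

Keeping each step as a small function with plain inputs and outputs makes it easier to test the steps separately and to swap one out (for example, a different normalization) without touching the rest of the pipeline.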
Clinical Data Programming & Analysis:
- Expedite essential preprocessing, analysis, and visualization of clinical data by developing standardized workflows.
- Analyze cross-modal associations to determine correlations between exploratory endpoints and links to outcomes of immunogenicity/reactogenicity.
- Deliver a standardized graphics library covering exploratory endpoints (transcriptomics, flow cytometry, serology) and outcomes (immunogenicity, reactogenicity).
- Provide associated workflows with well-defined inputs and outputs to maximize code and analysis reusability.
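One simple way to screen for the cross-modal associations mentioned above is a rank correlation between an exploratory endpoint and an outcome measured on the same subjects. The sketch below computes a Spearman correlation in plain Python. The choice of Spearman and all function names are assumptions for illustration, and the real analysis may call for other measures or for adjustment for covariates and multiple testing.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, assigning tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the full run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    return pearson(average_ranks(x), average_ranks(y))


# Toy per-subject values for illustration only; not real study data.
endpoint_score = [0.2, 1.5, 0.9, 2.4, 1.1]
antibody_titer = [40, 320, 160, 640, 80]
rho = spearman(endpoint_score, antibody_titer)
```

A rank-based measure is used here because assay readouts such as titers are often skewed; that is a design choice for the sketch rather than a requirement stated in the scope.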
Data Curation & Exploratory Data Analysis:
- Integrate heterogeneous datasets (both internal & external) from multiple modalities (bulk/single-cell RNAseq, flow cytometry) across studies of interest.
- Develop preprocessing and analytical workflows per dataset.
- Present exploratory data analysis results to describe key dataset parameters.
- Derive features of interest (e.g., differential expression signatures, subject responders vs. non-responders) for cross-study analyses.
- Harmonize datasets for utilization in meta-analysis approaches.
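The last two items can be made concrete with a small sketch: labeling subjects as responders from a fold rise over baseline, and putting a measurement from several studies on a comparable scale by standardizing within each study. Both are illustrative assumptions. The 4-fold default threshold, the per-study z-score, and the function names are hypothetical; the actual responder definition and harmonization method would be set per study and per assay.

```python
import statistics


def label_responders(baseline, post, min_fold_rise=4.0):
    """Label each subject True (responder) or False (non-responder).

    `baseline` and `post` map subject ID -> measurement. Subjects with no
    post-vaccination value or a non-positive baseline are skipped because
    the fold rise cannot be computed. The default threshold is an assumption.
    """
    labels = {}
    for subject, pre in baseline.items():
        if subject not in post or pre <= 0:
            continue
        labels[subject] = post[subject] / pre >= min_fold_rise
    return labels


def zscore_within_study(records):
    """Standardize values within each study.

    `records` is a list of (study_id, subject_id, value) tuples. Returns the
    same tuples with each value replaced by its within-study z-score.
    """
    by_study = {}
    for study, _, value in records:
        by_study.setdefault(study, []).append(value)

    stats = {}
    for study, values in by_study.items():
        if len(values) < 2:
            raise ValueError(f"study {study!r} needs at least two values")
        spread = statistics.stdev(values)
        if spread == 0:
            raise ValueError(f"study {study!r} has no variation to scale by")
        stats[study] = (statistics.fmean(values), spread)

    return [
        (study, subject, (value - stats[study][0]) / stats[study][1])
        for study, subject, value in records
    ]


# Toy values for illustration only; not real study data.
labels = label_responders({"s1": 10, "s2": 10}, {"s1": 40, "s2": 20})
harmonized = zscore_within_study(
    [("A", "s1", 1.0), ("A", "s2", 3.0), ("B", "s3", 100.0), ("B", "s4", 300.0)]
)
```

Within-study standardization removes differences in scale between studies but also removes genuine differences in study-level means, so whether it suits a given meta-analysis is a judgment call.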